Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SConstruct : Add Windows libraries for Cycles GPU rendering #5761

Merged
merged 3 commits into from
Apr 8, 2024

Conversation

ericmehl
Copy link
Collaborator

@ericmehl ericmehl commented Apr 2, 2024

This adds support for rendering with CUDA and Optix in Cycles on Windows. I'm not sure it will build successfully until the dependencies have been updated to 8.0.0a10. But when that's ready, this can be converted out of draft status.

Checklist

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have tested my change(s) in the test suite, and added new test cases where necessary.
  • My code follows the Gaffer project's prevailing coding style and conventions.

@ericmehl ericmehl marked this pull request as ready for review April 2, 2024 23:21
@ericmehl
Copy link
Collaborator Author

ericmehl commented Apr 2, 2024

I added a commit to update to the 8.0.0a10 dependencies, and it looks like it found them as expected. Fingers crossed for CI, clicking Ready to Review now!

@murraystevenson
Copy link
Contributor

Thanks Eric! Unfortunately I've not had much success with this Windows CI build finding the CUDA or Optix devices on my workstation, only the CPU is reported in GafferCycles.devices.

To rule out any driver related mischief I checked both Blender 4.0.2 and Alex's earlier Windows GPU Gaffer builds and they both find the GPU devices, so maybe it's related to the CI build environment if local builds are working on your machine?

@boberfly
Copy link
Collaborator

boberfly commented Apr 3, 2024

#5678

I was a bit concerned at the time when removing the devices method, as this function isn't re-entrant in the past. It might be the case where it is re-entrant in Linux but not in Windows. Worth a try to revert this change and see if devices appear.

@ericmehl
Copy link
Collaborator Author

ericmehl commented Apr 3, 2024

I cleared out all my caches and tried again with the artifacts from the build last night. Cycles picked up CUDA but not Optix. I think I sorted out Optix, no idea how it was getting found earlier, but I fixed an error in my dependencies build and confirmed it's working for me with caches cleared again.

So I'm up to getting Optix and CUDA again on my workstation. I wonder if it's a versions issue? I have CUDA 11.8 and Optix 7.7.0 installed on my system.

@ericmehl
Copy link
Collaborator Author

ericmehl commented Apr 3, 2024

I kicked off a new build of CI with my dependencies package fix. I'll check that again once it's done.

@ericmehl
Copy link
Collaborator Author

ericmehl commented Apr 3, 2024

The latest CI build (dated March 3 instead of March 2 in the archive filename) gives me both Optix and CUDA now.

So partial success!

I'll give it a try on my laptop as well, which hasn't seen any development work in this area and can be considered fresh for this purpose.

@ericmehl
Copy link
Collaborator Author

ericmehl commented Apr 3, 2024

My laptop won't work unfortunately because it doesn't have an NVIDIA card. Might need to rely on @murraystevenson as to test again with the new CI build from today.

@murraystevenson
Copy link
Contributor

I wonder if it's a versions issue? I have CUDA 11.8 and Optix 7.7.0 installed on my system.

Ah, in my case I didn't have the CUDA toolkit installed. Installing CUDA 11.8 then makes the CUDA and Optix devices available on my workstation with the latest CI build, but as mentioned previously this isn't required for Blender or Alex's builds...

@ericmehl
Copy link
Collaborator Author

ericmehl commented Apr 3, 2024

I didn't realize it's possible to run the GPU devices without the toolkit installed. I'll look into what I'm missing.

@boberfly
Copy link
Collaborator

boberfly commented Apr 3, 2024

Yep the toolkit isn't needed if a few conditions are met. The .ptx and .cubin files generated at build time get detected and fed straight into the driver as long as the driver is adequately up to date eg. the cuda/optix version used came out before the driver, or with the driver (cuda toolkit repos on linux tend to have a dkms driver with them). Blender devs tend to use optix 7.3.0, there's a higher chance that older drivers can use it then also but I've been making builds with a much more recent optix, might've been 7.7.0 I can't remember now.

@ericmehl
Copy link
Collaborator Author

ericmehl commented Apr 4, 2024

Ok, I have a pretty good handle on what's been tripping us up getting the GPU devices on Windows. The problem at the center is we weren't getting the right path set in Cycles' CUDADevice::have_precompiled_kernels() method, so it couldn't find the binary compilers.

The patch that Alex put into Cycles itself was getting the executable path ($GAFFER_ROOT/bin), stripping the bin part and using that. But there was one small correction needed to make that work - it was splitting on / but the path on Windows needs to be split on \\.

But investigating that showed that the Cycles cache_path was not getting initialized as expected in IECoreCycles::init(). That would make the splitting bug irrelevant. What I think is happening, check my understanding of linker details here, is :

  1. GafferCycles Python module was calling IECoreCycles::init().
  2. That registered the static cache_path with the cycles_util library IECoreCycles / GafferCycles was linked to.
  3. But the cycles_util library the GafferPython module was linked to never had that registered, so it could not find the devices in devices().

In my latest commit, I register the CYCLES_ROOT value in both modules. Testing this with CUDA and Optix uninstalled on my workstation, which previously caused it to not find the GPU compilers, works as expected with this change.

I also removed the patch from GafferDependencies Cycles since it should not find an empty cache_path string to trigger the logic for finding a path based on the executable path.

Hopefully when the CI build goes through, we can test again and find all the GPU devices...

@johnhaddon
Copy link
Member

That explanation seems completely plausible, and it's great that we can remove the patch from GafferHQ/dependencies. The only problem is that my NVIDIA device still isn't being found with the latest build artifacts from the PR. I've confirmed that it is found by Blender (1.0.2 and 4.1). And I've also switched CyclesOptions.device to "Custom" and entered "CUDA:00", which is enough to get CyclesRenderer in GafferCycles to do its own search of the devices (using the "main" instance of the static library), and that fails as well. Is there anything else I could test that would be useful?

I'm using NVIDIA driver 536.67, with CUDA 12.2.128 driver.

@johnhaddon
Copy link
Member

The new build artifacts after the latest push are working for me now. 🎉

@murraystevenson
Copy link
Contributor

The latest push is also working for me on Windows. 🥳

Copy link
Member

@johnhaddon johnhaddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job hunting this down @ericmehl. I'm definitely happy to go with this workaround for now, but I mentioned inline one alternative approach that might be better, assuming it works on Windows...

Comment on lines +457 to +459
// Must be initialized to ensure the statically linked Cycles module
// we are linked to can find the binaries for GPU rendering.
ccl::path_init( IECoreCycles::cyclesRoot() );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, anything in IECoreCycles::init() could need this special duplicate treatment, right? And we're just lucky that at the moment there's just some additional logging bits and bobs in there?

This got me wondering - do we even need to link the GafferCycles module to Cycles at all? It already links to libGafferCycles, so can't it get the Cycles symbols from there? I gave that a go here and it seemed to compile and run on Linux at least. Might be worth giving that a shot on Windows too? It'd certainly be less fragile if it works....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this offline and tried a few things to remove the need for linking the library in both modules, but unfortunately none worked. Without the linking the Python module to the various Cycles libs, there are 11 unresolved symbols. We also tried linking GafferCycles with /WHOLEARCHIVE and variants on /WHOLEARCHIVE:cycles_util but none worked, instead generating different unresolved symbols within GafferCycles.

Leaving as-is for now, with the hope that some day there will be a dynamic Cycles library we can link to.

@@ -47,6 +47,10 @@ IECORE_POP_DEFAULT_VISIBILITY
namespace IECoreCycles
{

/// Returns the value of the `CYCLES_ROOT` environment variable.
/// Throws if the variable is not set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is actually returning "" rather than throwing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! I had debated which to do and that comment got stale. Fixed in b5834f6

Because we're linking to static Cycles libraries, any static variables
will be local to that instance of the linked library. By calling
`IECoreCycles::init()`, we were registering the Cycles root directory
with the library linked to `IECoreCycles` / `GafferCycles`. That meant
`GafferCycles` _module_ was not registering the Cycles root, and
GPU devices would fail to initialize because it could not find their
binary compilers. Registering the Cycles root in both modules allows
all Cycles library instances to find the files it needs.
@ericmehl ericmehl merged commit 972b67c into GafferHQ:main Apr 8, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants